NSF PAR Search | NSF Public Access Repository


Exploiting ML Task Correlation in the Minimization of Capital Expense for GPU Data Centers

Subramaniyan, Srinivasan; Wang, Xiaorui (November 2025, The 44th IEEE International Performance Computing and Communications Conference (IPCCC 2025), Austin, Texas, November 2025.)

Efficiently scheduling ML training tasks in a GPU data center is a significant research challenge. Existing solutions commonly schedule such tasks based on their GPU utilization demand, but simply assume that the GPU utilization of each task can be approximated by a constant (e.g., its peak value), even though the GPU utilization of ML training tasks commonly varies significantly over time. Scheduling with a constant value can overestimate the number of GPUs needed and therefore lead to a high capital expense for GPU purchases. To address this, we design CorrGPU, a correlation-aware GPU scheduling algorithm that considers the utilization correlation among different tasks to minimize the number of GPUs needed in a data center. CorrGPU is based on a key observation from the analysis of real ML traces: the GPU utilization of different tasks does not peak at exactly the same time. As a result, if the correlations among tasks are considered in scheduling, more tasks can be scheduled onto the same GPUs without extending the training duration beyond the desired due time. For a GPU data center that is to be constructed based on an estimated ML workload, CorrGPU can help operators purchase fewer GPUs, thus reducing their capital expense. Our hardware testbed results demonstrate CorrGPU’s potential to reduce the number of GPUs needed. Our simulation results on real-world ML traces also show that CorrGPU outperforms several state-of-the-art solutions, reducing capital expense by 20.88%. This work was published in the 44th IEEE International Performance Computing and Communications Conference (IPCCC 2025) in November 2025. Our paper received the Best Paper Runner-up Award from IPCCC.
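The key observation in this abstract, that tasks whose utilization peaks at different times can share a GPU, can be illustrated with a small sketch. This is not the CorrGPU algorithm: it is a plain first-fit packing over made-up utilization traces, and the function name `gpus_needed`, the 100% capacity, and the trace values are all assumptions chosen for illustration. It also ignores the due-time constraint mentioned in the abstract.

```python
from typing import List, Sequence


def gpus_needed(
    traces: Sequence[Sequence[float]],
    capacity: float = 100.0,
    time_varying: bool = True,
) -> int:
    """Count GPUs used by first-fit packing of utilization traces.

    Hypothetical illustration only. With time_varying=False, each task
    is treated as a constant demand equal to its peak utilization.
    """
    gpus: List[List[float]] = []  # aggregate load per time step, one list per GPU
    for trace in traces:
        if time_varying:
            demand = list(trace)
        else:
            demand = [max(trace)] * len(trace)  # constant peak approximation
        for load in gpus:
            # The task fits if the combined load never exceeds capacity.
            if all(l + d <= capacity for l, d in zip(load, demand)):
                for i, d in enumerate(demand):
                    load[i] += d
                break
        else:
            gpus.append(demand)  # no existing GPU fits; open a new one
    return len(gpus)


# Four made-up tasks (utilization in %), each peaking at a different time step.
traces = [
    [80, 20, 20, 20],
    [20, 80, 20, 20],
    [20, 20, 80, 20],
    [20, 20, 20, 80],
]

peak_based = gpus_needed(traces, time_varying=False)
trace_based = gpus_needed(traces, time_varying=True)
print(peak_based, trace_based)
```

With these toy traces, the constant-peak approximation places each task on its own GPU, while packing on the full traces lets tasks with non-overlapping peaks share a GPU.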
Free, publicly-accessible full text available November 21, 2026
SEEB-GPU: Early-Exit Aware Scheduling and Batching for Edge GPU Inference

https://doi.org/10.1145/3769102.3772715

Subramaniyan, Srinivasan; Joshi, Rudra; Wang, Xiaorui; Brocanelli, Marco (December 2025, ACM)

Free, publicly-accessible full text available December 3, 2026
Power Capping of GPU Servers for Machine Learning Inference Optimization

Ma, Yuan; Subramaniyan, Srinivasan; Wang, Xiaorui (September 2025, The 54th International Conference on Parallel Processing (ICPP 2025), San Diego, California, September 2025.)

Free, publicly-accessible full text available September 8, 2026
Latency-guaranteed Co-location of Inference and Training for Reducing Data Center Expenses

Chen, Guoyu; Subramaniyan, Srinivasan; Wang, Xiaorui (July 2024, IEEE)

Today's data centers often need to run various machine learning (ML) applications with stringent SLO (Service-Level Objective) requirements, such as inference latency. To that end, data centers prefer to 1) over-provision the number of servers used for inference processing and 2) isolate them from other servers that run ML training, even though both use GPUs extensively, to minimize possible competition for computing resources. These practices result in low GPU utilization and thus a high capital expense. Hence, if training and inference jobs can be safely co-located on the same GPUs with explicit SLO guarantees, data centers could flexibly run fewer training jobs when an inference burst arrives and run more afterwards to increase GPU utilization, reducing their capital expenses. In this paper, we propose GPUColo, a two-tier co-location solution that provides explicit ML inference SLO guarantees for co-located GPUs. In the outer tier, we exploit GPU spatial sharing to dynamically adjust the percentage of active GPU threads allocated to spatially co-located inference and training processes, so that the inference latency can be guaranteed. Because spatial sharing can introduce considerable overhead and thus cannot be conducted at a fine time granularity, we design an inner tier that puts training jobs into periodic sleep, so that the inference jobs can quickly obtain more GPU resources for more prompt latency control. Our hardware testbed results show that GPUColo can precisely control the inference latency to the desired SLO while maximizing the throughput of the training jobs co-located on the same GPUs. Our large-scale simulation with a 57-day real-world data center trace (6500 GPUs) also demonstrates that GPUColo enables latency-guaranteed inference and training co-location. Consequently, it allows 74.9% of GPUs to be saved, for a much lower capital expense.
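The inner tier described in this abstract puts training jobs into periodic sleep so that inference latency can be corrected quickly. The abstract does not give the control law, so the sketch below is only one possible reading: a simple proportional update of the fraction of each period that training sleeps, driven by the gap between measured latency and the SLO. The function name `next_sleep_fraction`, the `gain` parameter, and all numbers are hypothetical.

```python
def next_sleep_fraction(
    current: float,
    measured_latency_ms: float,
    slo_ms: float,
    gain: float = 0.5,
) -> float:
    """Return the training sleep fraction for the next control period.

    Hypothetical proportional controller, not the paper's method.
    A latency above the SLO raises the sleep fraction (training yields
    more GPU time); a latency below the SLO lowers it.
    """
    error = (measured_latency_ms - slo_ms) / slo_ms  # relative SLO violation
    updated = current + gain * error
    return min(1.0, max(0.0, updated))  # keep the fraction within [0, 1]


# Inference is 20% over a 100 ms SLO, so training should sleep more.
raised = next_sleep_fraction(0.2, measured_latency_ms=120.0, slo_ms=100.0)

# Inference is well under the SLO, so training may run without sleeping.
lowered = next_sleep_fraction(0.0, measured_latency_ms=50.0, slo_ms=100.0)
print(raised, lowered)
```

Clamping to the range 0 to 1 keeps the result a valid fraction of the period. A real controller would also need to coordinate with the coarser outer tier that adjusts the GPU thread allocation.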
Full Text Available
Gbit/s Non-Binary LDPC Decoders: High-Throughput using High-Level Specifications

https://doi.org/10.1109/FCCM48280.2020.00058

Ferraz, Oscar; Subramaniyan, Srinivasan; Wang, Guohui; Cavallaro, Joseph R.; Falcao, Gabriel; Purnaprajna, Madhura (May 2020, 2020 28th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM))

Full Text Available
